directional derivative
Scaling Gaussian Processes with Derivative Information Using Variational Inference
Gaussian processes with derivative information are useful in many settings where derivative information is available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating O(N3D3) computational cost when training on N points in D input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-Dsetting, the high-N, high-Dsetting is still unexplored and of great value, particularly as machine learning problems increasingly become high dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size N nor the full dimensionality D. We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.
Asymptotic Theory for Graphical SLOPE: Precision Estimation and Pattern Convergence
Hejnรฝ, Ivan, Bonaccolto, Giovanni, Kremer, Philipp, Paterlini, Sandra, Bogdan, Maลgorzata, Wallin, Jonas
This paper studies Graphical SLOPE for precision matrix estimation, with emphasis on its ability to recover both sparsity and clusters of edges with equal or similar strength. In a fixed-dimensional regime, we establish that the root-$n$ scaled estimation error converges to the unique minimizer of a strictly convex optimization problem defined through the directional derivative of the SLOPE penalty. We also establish convergence of the induced SLOPE pattern, thereby obtaining an asymptotic characterization of the clustering structure selected by the estimator. A comparison with GLASSO shows that the grouping property of SLOPE can substantially improve estimation accuracy when the precision matrix exhibits structured edge patterns. To assess the effect of departures from Gaussianity, we then analyze Gaussian-loss precision matrix estimation under elliptical distributions. In this setting, we derive the limiting distribution and quantify the inflation in variability induced by heavy tails relative to the Gaussian benchmark. We also study TSLOPE, based on the multivariate $t$-loss, and derive its limiting distribution. The results show that TSLOPE offers clear advantages over GSLOPE under heavy-tailed data-generating mechanisms. Simulation evidence suggests that these qualitative conclusions persist in high-dimensional settings, and an empirical application shows that SLOPE-based estimators, especially TSLOPE, can uncover economically meaningful clustered dependence structures.